 deep episodic value iteration


[R] [1705.03562] Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning • r/MachineLearning

@machinelearnbot

One question, though: why didn't you try it directly on a standard RL benchmark such as CartPole or some of the Atari games? To be honest, this is the first time I've heard of the Omniglot World task (I know the dataset, but I've never seen it used for RL).


Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

arXiv.org Machine Learning

We present a new deep meta-reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI's model-based internal structure provides 'one-shot' transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
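
To make the abstract's idea concrete, here is a rough sketch of non-parametric, similarity-based value iteration over an episodic memory of transitions. It is not the paper's implementation. The identity `embed` function is a hypothetical stand-in for the learned deep network, the softmax-over-distances kernel is an assumed similarity metric, and every name (`embed`, `similarity`, `plan`, `q_values`, `chain_memory`) and the toy four-state chain are invented for illustration.

```python
import numpy as np

def embed(states):
    """Hypothetical stand-in for the learned deep embedding network (identity here)."""
    return np.asarray(states, dtype=float)

def similarity(queries, keys, temperature=0.1):
    """Assumed kernel: softmax over negative squared distances; each row sums to 1."""
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def plan(memory, gamma=0.9, n_iters=100, temperature=0.1):
    """Value iteration over the next-states stored in episodic memory.

    memory maps each action to (states, rewards, next_states) arrays.
    Returns, per action, the value of each stored next-state.
    """
    actions = sorted(memory)
    next_states = np.concatenate([embed(memory[a][2]) for a in actions])
    splits = np.cumsum([len(memory[a][1]) for a in actions])[:-1]
    # kernels[a][j, k]: how much stored transition k (taken with action a)
    # resembles next-state j, i.e. a soft, memory-based transition model.
    kernels = {a: similarity(next_states, embed(memory[a][0]), temperature)
               for a in actions}
    values = {a: np.zeros(len(memory[a][1])) for a in actions}
    for _ in range(n_iters):
        q = np.stack([kernels[a] @ (np.asarray(memory[a][1]) + gamma * values[a])
                      for a in actions], axis=1)
        values = dict(zip(actions, np.split(q.max(axis=1), splits)))
    return values

def q_values(state, memory, values, gamma=0.9, temperature=0.1):
    """Q-value of each action at a query state, read out from memory."""
    query = embed([state])
    return np.array([
        (similarity(query, embed(memory[a][0]), temperature)
         @ (np.asarray(memory[a][1]) + gamma * values[a]))[0]
        for a in sorted(memory)
    ])

def chain_memory(goal):
    """Toy 4-state chain (illustrative only): action 0 moves left, action 1 right."""
    states = np.arange(4.0).reshape(-1, 1)
    left, right = np.maximum(states - 1, 0), np.minimum(states + 1, 3)
    reward = lambda nxt: ((nxt[:, 0] == goal) & (states[:, 0] != goal)).astype(float)
    return {0: (states, reward(left), left), 1: (states, reward(right), right)}

memory = chain_memory(goal=3)
values = plan(memory)
print(q_values([1.0], memory, values))
```

Because planning reads rewards and transitions straight from memory, moving the goal (for example `chain_memory(goal=0)`) and calling `plan` again changes the greedy policy without retraining anything. That is the flavour of 'one-shot' transfer the abstract describes, under the simplifying assumptions above.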